-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: remove orphans #646
feat: remove orphans #646
Conversation
This pull request introduces 1 alert when merging 946c32d into 992120d - view on LGTM.com new alerts:
Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed 🚀. For more information, please check out our post on the GitHub blog. |
There's a use case to keep every nth version as snapshots to rebuild any version on the fly by replaying the change set from versiondb or file streamer outputs. I think it should be easy to support the current behaviour. I believe your prune(diff) algorithm is equavalent to the one in #641, adding some small adjustments. def find_orphaned_nodes(v1, v2):
pass
def delete_version(v):
for orphaned_node in find_orphaned_nodes(v, v+1):
delete_node(orphaned_node) Do several adjustments:
def delete_version(v):
predecessor = find_predecessor(v) # default to 0 if no predecessor
successor = find_successor(v) # if there's no successor, we should return error
for orphaned_node in find_orphaned_nodes(v, successor):
if orphaned_node.version <= predecessor:
# this node must be referenced by the predecessor version, so don't delete
else:
delete_node(orphaned_node) The The diff algorithm is just an implementation detail, the main issue is why drop the support for the gap versions, I think it's a useful feature. I believe we should at least discuss these two issues(remove orphan and support gap versions, maybe the slow startup issue as well) separately, they can be solved separately. |
I think startup time issue alone can be solved by lazy load mode. |
iterator.go
Outdated
return | ||
} | ||
node := iter.GetNode() | ||
iter.nodesToVisit = iter.nodesToVisit[:len(iter.nodesToVisit)-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I ask is this rm last node to allow visit left node if skipped (when left and right node exist in last iteration)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is to pop the visited node from the stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, it simulates pre-order DFS, so it assumes the root of subtree is already visited and pops it from the stack.
I would argue we should reduce the feature set and off load things like old commitments outside the node. Something like commitment streaming would solve this issue and would reduce the complexity in the code we maintain. Secondly the feature in the sdk of intermittent states was removed a while ago and I don't think anyone has come asking for it back outside of crypto.com I believe. Thirdly, there is a long term objective of trying experiment with different treee algorithms. The more we dice the feature set of iavl the easier it will be to swap trees for research in the future. Also this feature wasn't present on adr40 so surprised we need to keep it here. |
I understand the desire to simplify things here, it's just I don't see supporting this feature make other things too complicated. But yeah, maybe in the long run, people do need to have alternative implementations, as long as we keep the root hashes compatible. |
tbh, I don't like the word |
} | ||
pNode := prevIter.GetNode() | ||
|
||
if orgNode != nil && bytes.Equal(pNode.hash, orgNode.hash) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure preorder traversal is correct for this, I'm still trying to reason about the correctness of this algorithm, it seems we need an in-order traversal here which is ordered by node keys.
What's important here is curIter
and prevIter
must visit the sequence of orgNode
s in the same order, right?
Or we can keep the set of orgNode
in memory, so we can traverse the trees in whatever order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we do need preorder to try to skip the subtree early on, so we need to check twice for a branch node both preorder and inorder.
For example, if the branch node match the current orgNode
, we want to check with preorder to avoid traverse into the subtree.
But if the branch node's left children will match the current orgNode
, the branch node need to be checked again against the new orgNode
, because it could be a match too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pre-order or in-order are not important here, right? Because we only care about the order of leaves. I don't like keep orgNodes or orphans in memory, those amounts may be enormous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, orgNode could be branch nodes
It seems always deleting versions from the beginning is not optimal, because in that case we always do a full diff without a predecessor to limit the traversal space, just an intuition though, need closer analysis. |
yeah, you are right. It is designed before iterator implementation, I will re-struct it |
It seems like there is no way for now, maybe there will be a way using new node key (version + path) |
With new node key format, we can do the diff with two simple db iterations, and:
|
@cool-develope should we review and merge this before #650 |
no, #650 entirely belongs its implementation. |
@cool-develope can you address the traversal order issue, I used the temporary map approach to extract state changes here but following your diff algorithm in general. |
Say |
no, it is not possible, there is no overlapping between orgNodes, just think as a group of subtree. the orgNodes should be a group of subtree, right? |
Yeah, I guess that make sense. |
I guess the reasoning goes like this:
|
} | ||
pNode := prevIter.GetNode() | ||
|
||
if orgNode != nil && bytes.Equal(pNode.hash, orgNode.hash) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it more efficient to do an extra pointer comparison here: (pNode == orgNode || bytes.Equal(pNode.hash, orgNode.hash))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good point, if they are from cache
no, looks good, say again the order method is not important here |
order is important here, if the sequence of |
sorry, you are right, I mean the traversal order is not important, for example in-order traverse of |
yeah, agree, as long as they visit the |
Seems generally OK but I think we should have some tests for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested, audited, LGTM. I still think we should have some basic tests of traverseOprhans
though. Roughly what I did in my branch but more formal, I just compared STDOUT to my expectations. I can commit some of these later this week if you want.
It will probably also be useful to run some benchmark tests on larger data sets validating inputs/outputs at a high level. More like fuzz testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about dropping the support for deleting versions in middle, but other than that, LGTM.
fix test rename try to lower memory usage fix lint
Co-authored-by: Marko <[email protected]>
Co-authored-by: Marko <[email protected]> (cherry picked from commit c90c009)
…#665) Co-authored-by: yihuang <[email protected]>
…osmos#658) (cosmos#665) Co-authored-by: yihuang <[email protected]>
ref: #592, cosmos/cosmos-sdk#12989
Context
We assume the intermediate versions are unnecessary and by keeping the versions in sequence, we don't need to load all versions and we can keep only the
fristVersion
andlatestVersion
. It will resolve #637 .What does this PR do?
DeleteVersion
,DeleteVesions
, andDeleteVersionsRange
DeleteVersionsTo
orphans
from the storageLazyLoadVersion